Abstract

We derive asymptotic expansions up to order $n^{-1/2}$ for the nonnull distribution functions of the likelihood ratio, Wald, score and gradient test statistics in the class of dispersion models, under a sequence of Pitman alternatives. The asymptotic distributions of these statistics are obtained for testing a subset of regression parameters and for testing the precision parameter. Based on these nonnull asymptotic expansions, it is shown that there is no uniform superiority of one test with respect to the others for testing a subset of regression parameters. Furthermore, in order to compare the finite-sample performance of these tests in this class of models, Monte Carlo simulations are presented. An empirical application to a real data set is considered for illustrative purposes.

1 Introduction

The paper by Nelder and Wedderburn (1972) introduced the class of generalised linear models (GLMs) and showed that a large variety of non-normal data may be analysed by a simple general technique (see, for example, McCullagh and Nelder, 1989; Dobson and Barnett, 2008). The GLMs were originally developed for the exponential family of distributions, but the main ideas were extended to a wider class of models called dispersion models (DMs) in such a way that most of their good properties were preserved. This class of models was introduced by Jørgensen (1987a) and studied in detail in Jørgensen (1997a). Some recent references about DMs are Kokonendji et al. (2004), Jørgensen et al. (2010), Simas et al. (2010) and Rocha et al. (2010).

The class of DMs with position parameter $\theta$ (which varies in an interval of the real line) and precision parameter $\phi > 0$ has probability density function of the form

$$\pi(y; \theta, \phi) = \exp\{\phi\, t(y, \theta) + c(y, \phi)\}, \tag{1}$$

where $t(\cdot,\cdot)$ and $c(\cdot,\cdot)$ are known functions. If $Y$ is continuous, $\pi$ is assumed to be a density with respect to the Lebesgue measure, while if $Y$ is discrete, $\pi$ is assumed to be a density with respect to the counting measure. The parameter $\theta$ may be generally interpreted as a kind of location parameter, not necessarily the mean of the distribution. Several models of the form (1) are discussed by Jørgensen (1987a,b, 1997a), who also examined their statistical properties. It is evident that some special cases arise from (1). Exponential dispersion models (EDMs) represent a special case of DMs with $t(y, \theta) = y\theta - b(\theta)$, where $E(Y) = db(\theta)/d\theta$; see Jørgensen (1992). An important subclass of DMs of special interest, called proper dispersion models (PDMs), arises when $c(y, \phi)$ is additive, i.e. $c(y, \phi) = a_1(y) + a_2(\phi)$, where $a_1(\cdot)$ and $a_2(\cdot)$ are known functions (see, for instance, Jørgensen, 1997b). The class of PDMs covers important distributions which are not covered by the EDMs, such as the log-gamma distribution, the McCullagh distribution (McCullagh, 1989), the reciprocal inverse Gaussian distribution and the simplex distribution, which is suitable for modeling continuous proportions (Barndorff-Nielsen and Jørgensen, 1991). The von Mises distribution, which also belongs to the class of PDMs and does not belong to the EDMs, is particularly useful for the analysis of circular data; see Mardia and Jupp (2000). The PDMs have two important general properties. First, the distribution of the statistic $T = t(Y, \theta)$ does not depend on $\theta$ when $\phi$ is known, that is, $T$ is a pivotal quantity for $\theta$. Second, (1) is an exponential family with canonical statistic $T$ when $\theta$ is known.

Large-sample tests, such as the likelihood ratio, Wald and Rao score tests, are usually employed for testing hypotheses in parametric models. A new criterion for testing hypotheses, referred to as the gradient test, was proposed in Terrell (2002). Its statistic is very simple to compute when compared with the other three classic statistics. Here, it is worthwhile to quote Rao (2005): "The suggestion by Terrell is attractive as it is simple to compute. It would be of interest to investigate the performance of the [gradient] statistic." Also, Terrell's statistic shares the same first-order asymptotic properties with the likelihood ratio, Wald and score statistics. That is, to the first order of approximation, the likelihood ratio, Wald, score and gradient statistics have the same asymptotic distributional properties either under the null hypothesis or under a sequence of Pitman alternatives, i.e. a sequence of local alternatives that shrink to the null hypothesis at a convergence rate $n^{-1/2}$. Additionally, it is known that, up to an error of order $n^{-1}$, the likelihood ratio, Wald, score and gradient tests have the same size properties but their local powers differ in the $n^{-1/2}$ term. Therefore, a meaningful comparison among the criteria can be performed by comparing the nonnull asymptotic expansions to order $n^{-1/2}$ of the distribution functions of these statistics under a sequence of Pitman alternatives.

In this paper, our main objective is to derive nonnull asymptotic expansions to order $n^{-1/2}$ of the distribution functions of the likelihood ratio, Wald, score and gradient statistics under a sequence of local alternatives and to compare the local power of the corresponding tests in the class of DMs. In order to compare the finite-sample performance of these tests in this class of models we also perform a Monte Carlo simulation study. As far as we know, there is no mention in the statistical literature of the use of the gradient test in DMs.

The nonnull asymptotic expansions up to order $n^{-1/2}$ for the distribution functions of the likelihood ratio and Wald statistics were derived by Hayakawa (1975), while an analogous result for the score statistic was obtained by Harris and Peers (1980). The asymptotic expansion up to order $n^{-1/2}$ for the distribution function of the gradient statistic was derived by Lemonte and Ferrari (2010). The expansions are very general, but it is often difficult or even impossible to particularize their formulas for specific regression models. As we shall see below, we have been able to apply their results to DMs. In particular, we derive closed-form expressions for the coefficients that define the nonnull asymptotic expansions of these statistics in this class of models and show that there is no uniform superiority of one test with respect to the others for testing a subset of regression parameters.

The rest of the paper is organized as follows. Section 2 briefly describes the likelihood ratio, Wald, score and gradient tests. We present the class of DMs in Section 3. In Section 4 we derive the nonnull asymptotic expansions of the likelihood ratio, Wald, score and gradient statistics for testing a subset of regression parameters in DMs. The local powers of the likelihood ratio, Wald, score and gradient tests are compared in Section 5. In Section 6 we consider hypothesis testing on the precision parameter. Monte Carlo simulation results are presented in Section 7. We consider an empirical application in Section 8 for illustrative purposes. Section 9 closes the paper with some concluding remarks.



2 Background 

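Let $\ell(\theta)$, $U_\theta$ and $K_\theta$ denote the total log-likelihood function, the score function and the information matrix for the parameter vector $\theta = (\theta_1, \ldots, \theta_k)^\top$ of dimension $k$, respectively. Let $K_\theta^{-1}$ denote the inverse of $K_\theta$. Consider the partition $\theta = (\theta_1^\top, \theta_2^\top)^\top$, where the dimensions of $\theta_1$ and $\theta_2$ are $q$ and $k - q$, respectively. Suppose the interest lies in testing the composite null hypothesis $H_0: \theta_2 = \theta_{20}$ against $H_1: \theta_2 \neq \theta_{20}$, where $\theta_{20}$ is a specified vector. Hence, $\theta_1$ acts as a vector of nuisance parameters. The likelihood ratio ($S_1$), Wald ($S_2$), score ($S_3$) and gradient ($S_4$) statistics for testing $H_0$ versus $H_1$ are given, respectively, by

$$S_1 = 2\{\ell(\hat\theta) - \ell(\tilde\theta)\}, \quad S_2 = (\hat\theta - \tilde\theta)^\top \hat K_\theta (\hat\theta - \tilde\theta), \quad S_3 = \tilde U_\theta^\top \tilde K_\theta^{-1} \tilde U_\theta, \quad S_4 = \tilde U_\theta^\top (\hat\theta - \tilde\theta),$$

where $\hat\theta = (\hat\theta_1^\top, \hat\theta_2^\top)^\top$ and $\tilde\theta = (\tilde\theta_1^\top, \theta_{20}^\top)^\top$ denote the maximum likelihood estimators of $\theta = (\theta_1^\top, \theta_2^\top)^\top$ under $H_1$ and $H_0$, respectively, $\hat K_\theta = K_\theta(\hat\theta)$, $\tilde K_\theta = K_\theta(\tilde\theta)$ and $\tilde U_\theta = U_\theta(\tilde\theta)$. The limiting distribution of $S_1$, $S_2$, $S_3$ and $S_4$ is $\chi^2_{k-q}$ under $H_0$ and $\chi^2_{k-q,\lambda}$, i.e. a noncentral chi-square distribution with $k - q$ degrees of freedom and an appropriate noncentrality parameter $\lambda$, under $H_1$. The null hypothesis is rejected for a given nominal level, $\gamma$ say, if the test statistic exceeds the upper $100(1 - \gamma)\%$ quantile of the $\chi^2_{k-q}$ distribution.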

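From the partition of $\theta$, we have the corresponding partitions

$$U_\theta = (U_{\theta_1}^\top, U_{\theta_2}^\top)^\top, \quad K_\theta = \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix}, \quad K_\theta^{-1} = \begin{bmatrix} K^{11} & K^{12} \\ K^{21} & K^{22} \end{bmatrix}.$$

Thus, the statistics $S_2$, $S_3$ and $S_4$ can be rewritten as

$$S_2 = (\hat\theta_2 - \theta_{20})^\top (\hat K^{22})^{-1} (\hat\theta_2 - \theta_{20}), \quad S_3 = \tilde U_{\theta_2}^\top \tilde K^{22} \tilde U_{\theta_2}, \quad S_4 = \tilde U_{\theta_2}^\top (\hat\theta_2 - \theta_{20}),$$

where $\hat K^{22} = K^{22}(\hat\theta)$, $\tilde K^{22} = K^{22}(\tilde\theta)$ and $\tilde U_{\theta_2} = U_{\theta_2}(\tilde\theta)$.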

Notice that $S_4$ has a very simple form and does not involve the information matrix, neither expected nor observed, unlike $S_2$ and $S_3$. Terrell (2002) points out that the gradient statistic "is not transparently non-negative, even though it must be so asymptotically." His Theorem 2 implies that if the log-likelihood function is concave and is differentiable at $\tilde\theta$, then $S_4 \geq 0$.
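As a concrete illustration of the four statistics, the following minimal sketch evaluates $S_1, \ldots, S_4$ for a toy problem that is not taken from the paper: an i.i.d. exponential sample with rate $\lambda$ and the simple null hypothesis $H_0: \lambda = \lambda_0$ (so $k = 1$, $q = 0$ and there is no nuisance parameter). The function name and the data are hypothetical; the score $n/\lambda - \sum y$ and information $n/\lambda^2$ are those of the exponential model only.

```python
import math

def four_statistics_exponential(y, lam0):
    """LR, Wald, score and gradient statistics for H0: lambda = lam0
    in an i.i.d. exponential(rate=lambda) sample (illustrative toy case)."""
    n = len(y)
    total = sum(y)
    lam_hat = n / total                      # unrestricted MLE of the rate

    def loglik(lam):
        return n * math.log(lam) - lam * total

    def score(lam):
        return n / lam - total               # d loglik / d lambda

    def info(lam):
        return n / lam ** 2                  # expected information

    s1 = 2.0 * (loglik(lam_hat) - loglik(lam0))          # likelihood ratio
    s2 = (lam_hat - lam0) ** 2 * info(lam_hat)           # Wald
    s3 = score(lam0) ** 2 / info(lam0)                   # score
    s4 = score(lam0) * (lam_hat - lam0)                  # gradient
    return s1, s2, s3, s4

# Hypothetical data: ten observations equal to 0.5, tested against lam0 = 1.
s1, s2, s3, s4 = four_statistics_exponential([0.5] * 10, 1.0)
```

In this particular model the Wald and score statistics happen to coincide algebraically, whereas the gradient statistic differs from both and needs neither the information nor its inverse.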



Recently, Lemonte and Ferrari (2011) obtained the nonnull asymptotic expansions of the likelihood ratio, Wald, score and gradient statistics in Birnbaum-Saunders regression models (Rieck and Nedelman, 1991). An interesting finding is that, up to an error of order $n^{-1}$, the four tests have the same local power in this class of models. Their simulation study evidenced that the score and the gradient tests perform better than the likelihood ratio and Wald tests in small and moderate-sized samples and hence they concluded that the gradient test is an appealing alternative to the three classic asymptotic tests in Birnbaum-Saunders regressions.



3 Dispersion models 

We assume that the random variables $y_1, \ldots, y_n$ are independent and each $y_l$ has a probability density function of the form

$$\pi(y_l; \theta_l, \phi) = \exp\{\phi\, t(y_l, \theta_l) + c(y_l, \phi)\}, \quad l = 1, \ldots, n. \tag{2}$$

The mean of $Y_l$ will be denoted by $\mu_l$, and is not necessarily equal to $\theta_l$, the parameter of interest. In order to introduce a regression structure in the class of models in (2), we assume that

$$d(\theta_l) = \eta_l = f(x_l; \beta), \quad l = 1, \ldots, n, \tag{3}$$

where $d(\cdot)$ is a known one-to-one differentiable link function, $x_l = (x_{l1}, \ldots, x_{lm})^\top$ is an $m$-vector of nonstochastic variables associated with the $l$-th response, $\beta = (\beta_1, \ldots, \beta_p)^\top$ is a set of unknown parameters to be estimated ($m < p < n$), and $f(\cdot;\cdot)$ is a possibly nonlinear twice continuously differentiable function with respect to $\beta$. The regression structure links the covariates $x_l$ to the parameter of interest $\theta_l$. The $n \times p$ matrix of derivatives of $\eta = (\eta_1, \ldots, \eta_n)^\top$ with respect to $\beta$, specified by $X^* = \partial\eta/\partial\beta^\top$, is assumed to be of full rank, i.e. $\mathrm{rank}(X^*) = p$ for all $\beta$. Further, it is assumed that the precision parameter is unknown and that it is the same for all observations. It is also assumed that the usual regularity conditions for maximum likelihood estimation and large sample inference hold; see Cox and Hinkley (1974, Ch. 9).



The class of regression models defined by (2) and (3) extends the class of generalised linear models discussed by McCullagh and Nelder (1989) in two directions. First and as noted before, it includes important distributions which are not exponential family models. Second, it allows for a nonlinear structure in $\eta$. The class of models in (2)-(3) is also a natural extension of the exponential family nonlinear models (EFNLMs) introduced by Cordeiro and Paula (1989), which in turn extends the well-known GLMs by allowing the regression structure to be nonlinear. The EFNLMs are defined by equations (2) and (3), with $t(y_l, \theta_l) = y_l\theta_l - b(\theta_l)$ and $c(y_l, \phi) = a_1(y_l) + a_2(\phi)$ in (2).

Let $\ell = \ell(\beta, \phi) = \sum_{l=1}^n \{\phi\, t(y_l, \theta_l) + c(y_l, \phi)\}$ be the total log-likelihood function for $\beta$ and $\phi$, where $\theta_l$ is related to $\beta$ by (3). We define $D_{il} = D_{il}(\theta_l, \phi) = E\{\partial^i t(Y_l, \theta_l)/\partial\theta_l^i\}$, for $i = 1, 2, 3$ and $l = 1, \ldots, n$. From regularity conditions we have that $D_{1l} = 0$, for $l = 1, \ldots, n$. Table 1 lists $D_{2l}$ and $D_{3l}$ for some dispersion models. The total score function and the total Fisher information matrix for $\beta$ are given, respectively, by $U_\beta = \phi X^{*\top}\dot t$ and $K_\beta = \phi X^{*\top} W X^*$, where $\dot t = \dot t(y, \theta) = (\dot t_1, \ldots, \dot t_n)^\top$ is an $n \times 1$ vector with $\dot t_l = \partial t(y_l, \theta_l)/\partial\theta_l$, $y = (y_1, \ldots, y_n)^\top$, $\theta = (\theta_1, \ldots, \theta_n)^\top$ and $W = \mathrm{diag}\{w_1, \ldots, w_n\}$ with $w_l = -D_{2l}(d\theta_l/d\eta_l)^2$. A simple calculation shows that $E(\partial^2\ell/\partial\beta\,\partial\phi) = 0$ and then the parameters $\beta$ and $\phi$ are globally orthogonal (Cox and Reid, 1987). Let $\alpha_i = \sum_{l=1}^n E\{\partial^i c(Y_l, \phi)/\partial\phi^i\} = \sum_{l=1}^n E\{c^{(i)}(Y_l, \phi)\}$, for $i = 1, 2, 3$. The derivatives of the $\alpha_i$'s with respect to $\phi$ are written with primes, i.e. $\alpha_i' = d\alpha_i/d\phi$ and so on. We have that the joint information matrix for $(\beta^\top, \phi)^\top$ is given by $\mathrm{diag}\{K_\beta, -\alpha_2\}$.



Table 1: Expressions of $D_{2l}$ and $D_{3l}$ ($l = 1, \ldots, n$) for some dispersion models.

Model                           $D_{2l}$                        $D_{3l}$
Normal                          $-1$                            $0$
Inverse Gaussian                $-(-2\theta_l)^{-3/2}$          $-3(-2\theta_l)^{-5/2}$
Reciprocal inverse Gaussian     $-1/\theta_l$                   $0$
Gamma                           $-1/\theta_l^2$                 $2/\theta_l^3$
Reciprocal gamma                $-1/\theta_l^2$                 $2/\theta_l^3$
Log-gamma                       $-1$                            $1$
von Mises                       $-I_1(\phi)/I_0(\phi)$          $0$
Generalised hyperbolic secant   $2/(\theta_l^2 + 1)^3$          $(2\theta_l^3 + 10\theta_l)/(\theta_l^2 + 1)^3$

$I_j(\phi)$ is the modified Bessel function of the first kind and order $j$.



The maximum likelihood estimate (MLE) $\hat\beta$ of $\beta$ can be obtained iteratively using the standard reweighted least squares method (Jørgensen, 1983, 1984):

$$X^{*(m)\top} W^{(m)} X^{*(m)} \beta^{(m+1)} = X^{*(m)\top} W^{(m)} y^{*(m)}, \quad m = 0, 1, \ldots,$$

where $y^{*(m)} = X^{*(m)}\beta^{(m)} + N^{(m)}\dot t^{(m)}$ is an adjusted dependent variable and $N$ is a diagonal matrix given by $N = -\mathrm{diag}\{D_{21}^{-1}(d\theta_1/d\eta_1)^{-1}, \ldots, D_{2n}^{-1}(d\theta_n/d\eta_n)^{-1}\}$. The estimate $\hat\beta$ depends directly on the distribution only through the function $D_{2l}$ and does not depend on the parameter $\phi$. The maximum likelihood estimate of $\phi$ is the solution of

$$\sum_{l=1}^n \{t(y_l, \hat\theta_l) + c^{(1)}(y_l, \hat\phi)\} = 0. \tag{4}$$
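The reweighted least squares iteration can be sketched for a deliberately small case. The example below is an illustration under stated assumptions, not the authors' implementation: it takes a reciprocal inverse Gaussian model written with $t(y, \theta) = -(y - \theta)^2/(2y)$ (an assumed parametrisation, chosen because it gives $D_{2l} = -1/\theta_l$ as in the reciprocal inverse Gaussian row of Table 1), an identity link and a single regression parameter with $\eta_l = \beta x_l$. The adjusted variable is taken as $y^*_l = \eta_l + \dot t_l / \{-D_{2l}(d\theta_l/d\eta_l)\}$, which makes each step a Fisher scoring step. The function name and the data are hypothetical.

```python
def fisher_scoring_rig(x, y, beta0, tol=1e-12, max_iter=100):
    """Reweighted least squares for eta_l = beta * x_l, identity link,
    assumed RIG form t(y, theta) = -(y - theta)^2 / (2 y); requires x, y, beta0 > 0."""
    beta = beta0
    for _ in range(max_iter):
        theta = [beta * xl for xl in x]                        # theta_l = eta_l
        t_dot = [(yl - th) / yl for yl, th in zip(y, theta)]   # dt/dtheta
        d2 = [-1.0 / th for th in theta]                       # D_2l = -1/theta_l
        w = [-d for d in d2]                                   # w_l = -D_2l * (dtheta/deta)^2, with dtheta/deta = 1
        # adjusted dependent variable y*_l = eta_l + t_dot_l / (-D_2l)
        y_star = [th + td / (-d) for th, td, d in zip(theta, t_dot, d2)]
        num = sum(xl * wl * ys for xl, wl, ys in zip(x, w, y_star))
        den = sum(xl * wl * xl for xl, wl in zip(x, w))
        beta_new = num / den                                   # weighted least squares solve (p = 1)
        if abs(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta

# Hypothetical data with positive covariates and responses.
x = [1.0, 2.0, 3.0]
y = [1.2, 1.9, 3.3]
beta_hat = fisher_scoring_rig(x, y, beta0=1.0)
# In this toy model the score equation has the closed-form root sum(x) / sum(x^2 / y).
beta_closed = sum(x) / sum(xl * xl / yl for xl, yl in zip(x, y))
```

The precision parameter never enters the iteration, which reflects the remark that $\hat\beta$ does not depend on $\phi$.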
The maximum likelihood estimators $\hat\beta$ and $\hat\phi$ are asymptotically independent due to their asymptotic normality and the block diagonal structure of the joint information matrix. If the model is a PDM the $\alpha_i$'s can be expressed as functions of $\phi$ only, namely $\alpha_i = n a_2^{(i)}(\phi)$ for $i = 1, 2, 3$, where $a_2^{(i)}(\phi)$ is the $i$-th derivative of $a_2(\phi)$ with respect to $\phi$. In this case, the $(p+1, p+1)$-th element of the joint information matrix is simply $-n a_2^{(2)}(\phi)$ and equation (4) reduces to $a_2^{(1)}(\hat\phi) = -\sum_{l=1}^n t(y_l, \hat\theta_l)/n$.
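For a PDM with $a_2(\phi) = \log(\phi)/2$, which is the form stated in Section 6 for the normal and inverse Gaussian models, the reduced equation becomes $1/(2\hat\phi) = -\sum_l t(y_l, \hat\theta_l)/n$ and can be solved explicitly. The sketch below assumes the normal model written as $t(y, \theta) = -(y - \theta)^2/2$ (an assumption about the parametrisation); the function name and data are hypothetical.

```python
def precision_mle_normal(y, theta_hat):
    """Solve a_2'(phi) = -sum(t)/n with a_2(phi) = log(phi)/2 and
    t(y, theta) = -(y - theta)^2 / 2 (assumed normal parametrisation)."""
    n = len(y)
    t_sum = sum(-0.5 * (yl - th) ** 2 for yl, th in zip(y, theta_hat))
    return -n / (2.0 * t_sum)    # equals n / sum of squared residuals

# Hypothetical fitted values: a common position equal to the sample mean.
phi_hat = precision_mle_normal([1.0, 2.0, 3.0, 4.0], [2.5] * 4)
```

Under these assumptions the estimate is the reciprocal of the usual maximum likelihood variance estimate.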



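In what follows, we shall consider tests based on the likelihood ratio ($S_1$), Wald ($S_2$), Rao score ($S_3$) and gradient ($S_4$) statistics in the class of DMs for testing a composite null hypothesis $H_0: \beta_2 = \beta_{20}$. This hypothesis will be tested against the alternative hypothesis $H_1: \beta_2 \neq \beta_{20}$, where $\beta$ is partitioned as $\beta = (\beta_1^\top, \beta_2^\top)^\top$, with $\beta_1 = (\beta_1, \ldots, \beta_q)^\top$ and $\beta_2 = (\beta_{q+1}, \ldots, \beta_p)^\top$. Here, $\beta_{20}$ is a fixed column vector of dimension $p - q$. The partition of the parameter vector $\beta$ induces the corresponding partitions $U_\beta = (U_{\beta_1}^\top, U_{\beta_2}^\top)^\top$, with $U_{\beta_1} = \phi X_1^{*\top}\dot t$ and $U_{\beta_2} = \phi X_2^{*\top}\dot t$,

$$K_\beta = \begin{bmatrix} K_{\beta 11} & K_{\beta 12} \\ K_{\beta 21} & K_{\beta 22} \end{bmatrix} = \phi \begin{bmatrix} X_1^{*\top} W X_1^* & X_1^{*\top} W X_2^* \\ X_2^{*\top} W X_1^* & X_2^{*\top} W X_2^* \end{bmatrix},$$

with the matrix $X^*$ partitioned as $X^* = [X_1^* \; X_2^*]$, $X_1^*$ being $n \times q$ and $X_2^*$ being $n \times (p - q)$. Let $(\hat\beta_1, \hat\beta_2, \hat\phi)$ and $(\tilde\beta_1, \beta_{20}, \tilde\phi)$ be the unrestricted and restricted MLEs of $(\beta_1, \beta_2, \phi)$, respectively. The likelihood ratio, Wald, score and gradient statistics for testing $H_0$ can be expressed, respectively, as

$$S_1 = 2\{\ell(\hat\beta_1, \hat\beta_2, \hat\phi) - \ell(\tilde\beta_1, \beta_{20}, \tilde\phi)\}, \quad S_2 = \hat\phi\,(\hat\beta_2 - \beta_{20})^\top (\hat R^\top \hat W \hat R)(\hat\beta_2 - \beta_{20}),$$

$$S_3 = \tilde s^\top \tilde W^{1/2} \tilde X_2^* (\tilde R^\top \tilde W \tilde R)^{-1} \tilde X_2^{*\top} \tilde W^{1/2} \tilde s, \quad S_4 = \tilde\phi^{1/2}\, \tilde s^\top \tilde W^{1/2} \tilde X_2^* (\hat\beta_2 - \beta_{20}),$$

where $s = (s_1, \ldots, s_n)^\top$ with $s_l = \phi^{1/2}\dot t_l(-D_{2l})^{-1/2}$ and $R = X_2^* - X_1^*(X_1^{*\top} W X_1^*)^{-1} X_1^{*\top} W X_2^*$. Here, tildes and hats indicate evaluation at the restricted and unrestricted MLEs, respectively. The limiting distribution of all these statistics under $H_0$ is $\chi^2_{p-q}$. Note that, unlike the Wald and score statistics, the gradient statistic does not involve any matrix inversion.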



4 Nonnull asymptotic distributions in DMs 

We present in this section expressions for the nonnull asymptotic expansions up to order $n^{-1/2}$ for the nonnull distribution of the likelihood ratio, Wald, score and gradient statistics for testing a subset of regression parameters in DMs. It should be mentioned that the general nonnull asymptotic expansions derived in Hayakawa (1975), Harris and Peers (1980) and Lemonte and Ferrari (2010) were developed for continuous distributions. It implies that the results derived in this section are only valid for continuous DMs. Here, we shall assume the following local alternative hypothesis $H_{1n}: \beta_2 = \beta_{20} + \epsilon$, where $\epsilon = (\epsilon_{q+1}, \ldots, \epsilon_p)^\top$ with $\epsilon_r = O(n^{-1/2})$ for $r = q + 1, \ldots, p$. We introduce the following quantities:

$$\epsilon^* = \begin{bmatrix} K_{\beta 11}^{-1} K_{\beta 12} \\ -I_{p-q} \end{bmatrix} \epsilon, \quad M = \begin{bmatrix} K_{\beta 11}^{-1} & 0 \\ 0 & 0 \end{bmatrix},$$

where $I_{p-q}$ is a $(p - q) \times (p - q)$ identity matrix. Additionally, let $Z = X^*(X^{*\top} W X^*)^{-1} X^{*\top} = \{z_{lm}\}$, $Z_1 = X_1^*(X_1^{*\top} W X_1^*)^{-1} X_1^{*\top} = \{z_{1lm}\}$,

$$\ddot X_l = \left(\frac{\partial^2 \eta_l}{\partial\beta_r\,\partial\beta_s}\right) = \begin{bmatrix} \ddot X_{11l} & \ddot X_{12l} \\ \ddot X_{21l} & \ddot X_{22l} \end{bmatrix}, \quad r, s = 1, \ldots, p, \quad l = 1, \ldots, n,$$

$Z_d = \mathrm{diag}\{z_{11}, \ldots, z_{nn}\}$, $Z_{1d} = \mathrm{diag}\{z_{111}, \ldots, z_{1nn}\}$, $F = \mathrm{diag}\{f_1, \ldots, f_n\}$, $G = \mathrm{diag}\{g_1, \ldots, g_n\}$, $E = \mathrm{diag}\{e_1, \ldots, e_n\}$, $t = (t_1, \ldots, t_n)^\top = X^*\epsilon^*$, $b = (b_1, \ldots, b_n)^\top = X_2^*\epsilon$, $T = \mathrm{diag}\{t_1, \ldots, t_n\}$, $T^{(2)} = T \odot T$, $T^{(3)} = T^{(2)} \odot T$ and $B = \mathrm{diag}\{b_1, \ldots, b_n\}$, where "$\odot$" denotes the Hadamard (direct) product of matrices, and

$$f_l = -\frac{d\theta_l}{d\eta_l}\frac{d^2\theta_l}{d\eta_l^2} D_{2l} - \left(\frac{d\theta_l}{d\eta_l}\right)^3 D_{3l}, \quad g_l = -\frac{d\theta_l}{d\eta_l}\frac{d^2\theta_l}{d\eta_l^2} D_{2l}, \quad e_l = -\left(\frac{d\theta_l}{d\eta_l}\right)^3 D'_{2l}, \quad l = 1, \ldots, n,$$

where $D'_{2l}$ denotes the first derivative of $D_{2l}$ with respect to $\theta_l$, for $l = 1, \ldots, n$.

The nonnull distributions of $S_1$, $S_2$, $S_3$ and $S_4$ under Pitman alternatives for testing $H_0: \beta_2 = \beta_{20}$ in DMs can be expressed as

$$\Pr(S_i \leq x) = G_{p-q,\lambda}(x) + \sum_{k=0}^{3} b_{ik}\, G_{p-q+2k,\lambda}(x) + O(n^{-1}), \quad i = 1, 2, 3, 4,$$

where $G_{m,\lambda}(x)$ is the cumulative distribution function of a non-central chi-square variate with $m$ degrees of freedom and non-centrality parameter $\lambda$. Here, $\lambda = \phi\,\mathrm{tr}\{K_{22.1}\epsilon\epsilon^\top\}/2$, where $K_{22.1} = K_{\beta 22} - K_{\beta 21}K_{\beta 11}^{-1}K_{\beta 12}$ and $\mathrm{tr}(\cdot)$ denotes the trace operator. The coefficients $b_{ik}$'s ($i = 1, 2, 3, 4$ and $k = 0, 1, 2, 3$) can be written in matrix notation, after extensive algebra, as

b u = ^tr{(F + 2G)BT {2) + (2E - F + 2G)T (3) + WT(C + 2P)} 
2 

+ itr{(2£ -F + 2G)Z ld T + WJT}, 

bn = -^tv{(3E -2F + 2G)T^}, b 13 = 0, 

b 21 = ^tr{(F + 2G)BT {2) + {2E - F + 2G)T (3) + WT(C + 2P)} 

+ itr{(2£ - F + 2G)Z d T + 2(F - E)(Z d - Z ld )T + W(UT + 21?)}, 

b 22 = ^tr{(F - F)T^ + WTC} - hr{(F + 2G)(Z d - Z ld )T + WT(C7 - J) + 2Wff }, 

& 23 = -^tr{(F + 2G)T® + 3VTTC}, 
63! = %{(E + 2G)BT {2) + (2E — F + 2G)T (3) + WT(C + 2P)} 

+ Ux{{2E -F + 2G)Z ld T + (3E - 2F + 2G){Z d - Z ld )T + WTJ}, 

b 3 2 = -^tr{(3£ - 2F + 2G)(Z d - Z W )T}, 6 33 = ~^{(3E - 2F + 2G)T( 3 >}, 

6 41 = ^tr{(£ + 2G)BT {2) + (2E - F + 2G)T (3) + WT(C + 2P)} 

+ ^tr{(6G - F + AE)Z ld T - (F + 2G)Z d T + WT{3J - U) - 2WH}, 



b 42 = -j^r{(2E -F + 2G)T (3) + WTC} 

+ hr{(F + 2G)(Z d - Z ld )T + WT(U - J) + 2WH}, 

b 43 = ^tr{(F + 2G)T^ + 3WTC}, 

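where $U = \mathrm{diag}\{u_1, \ldots, u_n\}$ with $u_l = \mathrm{tr}\{\ddot X_l (X^{*\top} W X^*)^{-1}\}$, $J = \mathrm{diag}\{j_1, \ldots, j_n\}$ with $j_l = \mathrm{tr}\{\ddot X_{11l}(X_1^{*\top} W X_1^*)^{-1}\}$, $C = \mathrm{diag}\{c_1, \ldots, c_n\}$ with $c_l = \mathrm{tr}\{\ddot X_l\,\epsilon^*\epsilon^{*\top}\}$, $P = \mathrm{diag}\{p_1, \ldots, p_n\}$ with $p_l = \mathrm{tr}\{\ddot X_l\,\epsilon^*\delta^\top\}$, $H = \mathrm{diag}\{h_1, \ldots, h_n\}$ with $h_l = \phi\,\mathrm{tr}\{M \ddot X_l\,\epsilon^* x_l^{*\top}\}$, $\delta^\top = (0^\top, \epsilon^\top)$ and $x_l^{*\top}$ is the $l$-th row of $X^*$. The coefficients $b_{i0}$ are obtained from $b_{i0} = -(b_{i1} + b_{i2} + b_{i3})$, for $i = 1, 2, 3, 4$. The $b_{ik}$'s are of order $n^{-1/2}$ and all quantities except $\epsilon$ are evaluated under the null hypothesis $H_0$. The detailed derivation of these expressions is long and extremely tedious but may be obtained from the authors upon request.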

It is interesting to note that the $b_{ik}$'s are functions of the local derivative matrix and of the (possibly unknown) precision parameter. These coefficients depend on the second derivative of the (possibly nonlinear) function $f(x_l; \beta)$ and involve the link function and its first and second derivatives. Unfortunately, they are very difficult to interpret. The matrices $C$, $H$, $J$, $P$ and $U$ may be considered the nonlinear contribution of the dispersion model since they vanish if the regression model is linear. Obviously, these coefficients depend heavily on the particular dispersion model under consideration. In particular, these coefficients do not change for the class of PDMs, since the only difference between PDMs and DMs is the form of the function $c(\cdot,\cdot)$, which can be decomposed as $c(y, \phi) = a_1(y) + a_2(\phi)$ for PDMs. By replacing $E$ by $F - G$ in these coefficients, we obtain the nonnull asymptotic distributions of the four statistics in the class of EFNLMs (see Lemonte, 2011).

Some simplifications in the coefficients $b_{ik}$ ($i = 1, 2, 3, 4$ and $k = 0, 1, 2, 3$) can be achieved by examining special cases. For example, consider the null hypothesis $H_0: \beta = \beta_0$ (i.e. $q = 0$) and an identity link function ($d(\theta_l) = \theta_l$), which implies that $f_l = -D_{3l}$, $g_l = 0$ and $e_l = -D'_{2l}$ ($l = 1, \ldots, n$). Therefore, the $b_{ik}$'s can be written as

b n = -triEBT® + (2E - F)T (3) + WT{C + 2P)} + ^tr{ WJT}, 
6i2 = h 3 = -^tr{(3F - 2F)T^}, b 13 = 0, 632 = ~H(3E - 2F)Z d T}, 
b 21 = -tr{EBT {2) + (2E - F)T {3) + WT(C + 2P)} 

Z 

+ ^tr{FZ d T + WXE/T + 2Jf)}, 

6 22 = ^tr{(F - F)T (3) + WTC} - \\i{FZ d T + WT(C7 - J) + 2W£T}, 
hs = -2643 = -^tr{FT( 3 ) + 3WTC}, 



hi = 



-ix{EBT {2) + (2E - F)T (3) + WT{C + 2P)} 
2 

+ -tr{(3F - 2F)Z d T + WTJ}, 

Z 



641 = -tr{FBT^ + (2F - F)T (3) + WT(C + 2P)} 
+ ^tr{-FZ d T + WT(3 J - E7) - 2W/f }, 

b 42 = -^tr{(2F - F)T<® + WTC} + Ux{FZ d T + WT(U - J) + 2Wif }, 

and $b_{i0} = -(b_{i1} + b_{i2} + b_{i3})$, for $i = 1, 2, 3, 4$. For the log-gamma model, the above coefficients reduce to

fen = ^tr{-FT^ + WT[C + 2P)} + kr{WJT}, b 12 = b 33 = ^tr{ FT^}, b 13 = 0, 
b 21 = ^tr{-FT {3) + WT(C + 2P)} + itr{FZ d T + W(UT + 2H)}, 

Z Z 

b 22 = ^tr{PT (3) + WTC} - -tr{FZ d T + WT(U - J) + 2WH}, 

z z 

b 23 = -2643 = -^tr{PT( 3 ) + 3WTC}, b 32 = tr{FZ d T}, 

b 31 = |tr{- FT® + WT{C + 2P)} + itr{-2FZ d T + WTJ}, 
z\ z\ 

b 41 = ^tr{-FT< 3 > + WT(C + 2P)} + itr{-FZ d T + WT(3J - U) - 2WH} : 

z\ 4 

b 42 = -|tr{ -FT {3) + WTC} + itr{FZ d T + WT(U - J) + 2Wif}, 
Also, for the von Mises model we have 

hi = b 31 = ^tr{ WT(C + 2P)} + hr{WJT}, b 12 = b 13 = b 32 = b 33 = 0, 



b 21 = ^tr{WT(C + 2P)} + hr{W(UT + 2H)}, b 23 = -26 43 = -|tr{ WTC}, 

b 22 = -26 42 = ^tr{WTC} - Ux{WT(U - J) + 2WH}, 

6 41 = ^tr{WT(C + 2P)} + itr{WT(3J - 17) - 2Wif}, 
Note that for the von Mises linear regression model, the $b_{ik}$'s above vanish and hence we can write

$$\Pr(S_i \leq x) = G_{p,\lambda}(x) + O(n^{-1}), \quad i = 1, 2, 3, 4.$$

This is a very interesting result, which implies that the likelihood ratio, score, Wald and gradient tests for testing the null hypothesis $H_0: \beta = \beta_0$ have exactly the same local power up to an error of order $n^{-1}$ when we consider an identity link function. It should be noticed that this result also holds for testing the composite null hypothesis $H_0: \beta_2 = \beta_{20}$, i.e. $\Pr(S_i \leq x) = G_{p-q,\lambda}(x) + O(n^{-1})$, for $i = 1, 2, 3, 4$.

Now, we present the coefficients that define the nonnull asymptotic distributions of the likelihood ratio, Wald, score and gradient statistics for testing the composite null hypothesis $H_0: \beta_2 = \beta_{20}$ in GLMs. We have $t(y_l, \theta_l) = y_l\theta_l - b(\theta_l)$ and $\mu_l = E(Y_l) = db(\theta_l)/d\theta_l$. The class of GLMs is characterized by its variance function $V_l = d\mu_l/d\theta_l$, which plays a key role in the study of its mathematical properties and estimation. The variance of $Y_l$ can be written as $\mathrm{var}(Y_l) = \phi^{-1}V_l$. For the GLMs, we have $D_{2l} = -V_l^{-1}$ and $D_{3l} = 2V_l^{-2}(dV_l/d\mu_l)$ and hence we can rewrite

$$f_l = \frac{1}{V_l}\frac{d\mu_l}{d\eta_l}\frac{d^2\mu_l}{d\eta_l^2}, \quad g_l = \frac{1}{V_l}\frac{d\mu_l}{d\eta_l}\frac{d^2\mu_l}{d\eta_l^2} - \frac{1}{V_l^2}\frac{dV_l}{d\mu_l}\left(\frac{d\mu_l}{d\eta_l}\right)^3, \quad l = 1, \ldots, n,$$

and redefine the matrices $F$ and $G$ given before. Additionally, the link function is $d(\mu_l) = \eta_l = x_l^\top\beta$ with $m = p$. Also, $\eta = X\beta$ with $X = (x_1, \ldots, x_n)^\top$, i.e. here $X^* = X$. Hence, in this class of models we have

b u = tr{(F + G)BT& + FT^} + hr{FZ ld T}, b 12 = b 33 = ~^{(F - G)T< 3 >}, 

621 = ^tr{(F + G)BT( 2 ' + FT< 3 )} + \i{FZ d T + 2G(Z d - Z ld )T}, 
b 22 = ^tr{GT®} - itr{(F + 2G)(Z d - Z ld )T}, b 13 = 0, 
623 = -2643 = -^tr{(F + 2G)T( 3 )}, 632 = -^tr{(F - G)(Z d - Z ld )T}, 
631 = ^tr{(F + G)BT^ + FT^} + itr{FZ 1(i T + (F — G)(Z d - Z ld )T}, 
641 = ^tr{(F + G)BT^ + FT<®} + ^tr{(3F + 2G)Z ld T — (F + 2G)Z d T}, 
642 = -^tr{FT( 3 )} + itr{(F + 2G)(Z d - Z ld )T}, 
By considering the identity link function, these coefficients reduce to 

b u = ^tr{GBT^}, b 12 = 6 33 = |tr{GT( 3 )}, 632 = h 2 = ^tr{G(Z, - Z ld )T}, 
l b 2 

&13 = 0, 6 2 3 = -2643 = -2&12, 621 = &ii + 26 32 , b 22 = 36i2 - 26 32 , 631 = 641 = &11 - b 32 . 

As expected, the above coefficients vanish for the normal model since the nonnull distributions of all the four criteria agree with the $\chi^2_{p-q,\lambda}$ distribution.

5 Power comparisons 

It is known that, to the first order of approximation, the likelihood ratio, Wald, score and gradient statistics have the same asymptotic distributional properties either under the null hypothesis or under a sequence of local alternatives. On the other hand, up to an error of order $n^{-1}$ the corresponding criteria have the same size properties but their local powers differ in the $n^{-1/2}$ term. A meaningful comparison among the criteria can then be performed by comparing the nonnull asymptotic expansions to order $n^{-1/2}$, i.e. ignoring terms of order smaller than $n^{-1/2}$.

In what follows, we shall compare the local powers of the rival tests based on the general nonnull asymptotic expansions derived in Section 4 for testing the null hypothesis $H_0: \beta_2 = \beta_{20}$ in the class of DMs. Let $\Pi_i$ be the power function, up to order $n^{-1/2}$, of the test that uses the statistic $S_i$, for $i = 1, 2, 3, 4$. We have

$$\Pi_i - \Pi_j = \sum_{k=0}^{3} (b_{jk} - b_{ik})\, G_{p-q+2k,\lambda}(x), \tag{5}$$

for $i \neq j$. It is well known that

$$G_{m,\lambda}(x) - G_{m+2,\lambda}(x) = 2 g_{m+2,\lambda}(x), \tag{6}$$

where $g_{\nu,\lambda}(x)$ is the probability density function of a non-central chi-square random variable with $\nu$ degrees of freedom and non-centrality parameter $\lambda$. From (5) and (6) we have after some algebra

$$\begin{aligned}
\Pi_1 - \Pi_4 &= k_1\, g_{p-q+4,\lambda}(x) + k_2\, g_{p-q+6,\lambda}(x), & \Pi_2 - \Pi_4 &= k_3\, g_{p-q+4,\lambda}(x) + k_4\, g_{p-q+6,\lambda}(x), \\
\Pi_3 - \Pi_4 &= k_5\, g_{p-q+4,\lambda}(x) + k_6\, g_{p-q+6,\lambda}(x), & \Pi_1 - \Pi_2 &= k_7\, g_{p-q+4,\lambda}(x) + k_8\, g_{p-q+6,\lambda}(x), \\
\Pi_1 - \Pi_3 &= k_9\, g_{p-q+4,\lambda}(x) + k_{10}\, g_{p-q+6,\lambda}(x), & \Pi_2 - \Pi_3 &= k_{11}\, g_{p-q+4,\lambda}(x) + k_{12}\, g_{p-q+6,\lambda}(x),
\end{aligned} \tag{7}$$

where

h = ~ti{(F + 2G)(Z d - Z ld )T} + ~tr{ WT(J - U) - 2WH}, 

k 2 = -|tr{(F + 2G)T^} - |tr{ WTC}, k 3 = 3k h h = Sk 2 , 
o 2 

k 5 = k!- tr{(3£ -2F + 2G){Z d - Z ld )T}, 
k e = -^tr{(2£ — F + 2G)T®} - ^tr{WTC}, 
k 7 = -2k u k 8 = -2k 2 , k 9 = h-k 5 , k l0 = ^tr{(3E-2F + 2G)T^}, 

fcn = -3tr{(F - E){Z d - Z ld )T} - tr{WT{U - J) + 2WH}, 

k 12 = -0tr{(F - E)T {3) } - 0tr{ WTC}. 

For proper dispersion models, the above expressions are the same. Replacing $E$ by $F - G$ we obtain these quantities for exponential family nonlinear models. From equations (7) we have $\Pi_1 > \Pi_3$ if $k_9 \geq 0$ and $k_{10} \geq 0$ with $k_9 + k_{10} > 0$, and if $k_9 \leq 0$ and $k_{10} \leq 0$ with $k_9 + k_{10} < 0$, we have $\Pi_1 < \Pi_3$. Also, $\Pi_1 = \Pi_3$ if $k_9 = k_{10} = 0$, i.e. $F = G$ and $E = 0$, which occurs only for von Mises and normal models with any link function. Additionally, equations (7) show that, with the exception of the likelihood ratio and score tests, it is not possible to have any other equality among the power functions in the class of DMs for testing the null hypothesis $H_0: \beta_2 = \beta_{20}$. The reason is that $C$, $H$, $J$ and $U$, which may be considered as the nonlinear contribution of the dispersion model, vanish only for linear regression models. It implies that only strict inequality holds for any other power comparison among the power functions of the tests that are based on the statistics $S_1$, $S_2$, $S_3$ and $S_4$. For example, from (7) we have $\Pi_1 > \Pi_4$ ($\Pi_1 < \Pi_4$) if $k_1 \geq 0$ and $k_2 \geq 0$ with $k_1 + k_2 > 0$ (if $k_1 \leq 0$ and $k_2 \leq 0$ with $k_1 + k_2 < 0$), and so on.
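Identity (6), which drives the reduction of the power differences to density terms, can be checked numerically. The sketch below is an illustration only: it assumes the non-central chi-square distribution is represented as a Poisson mixture of central chi-squares with Poisson mean $\lambda$ (a convention assumed here, suggested by the factor $1/2$ in the definition of $\lambda$), and it restricts attention to even degrees of freedom so that the central distribution function has a closed form. All function names are hypothetical.

```python
import math

def chi2_cdf_even(x, m):
    """Central chi-square CDF for even degrees of freedom m (closed form)."""
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(m // 2):
        total += term               # term equals half**j / j!
        term *= half / (j + 1)
    return 1.0 - math.exp(-half) * total

def chi2_pdf(x, m):
    """Central chi-square density, computed on the log scale (x > 0)."""
    log_pdf = ((m / 2.0 - 1.0) * math.log(x) - x / 2.0
               - (m / 2.0) * math.log(2.0) - math.lgamma(m / 2.0))
    return math.exp(log_pdf)

def noncentral(fn, x, m, lam, terms=100):
    """Poisson(lam)-weighted mixture of fn(x, m + 2j), truncated at `terms`."""
    weight = math.exp(-lam)
    total = 0.0
    for j in range(terms):
        total += weight * fn(x, m + 2 * j)
        weight *= lam / (j + 1)
    return total

x, m, lam = 3.0, 2, 1.5
lhs = noncentral(chi2_cdf_even, x, m, lam) - noncentral(chi2_cdf_even, x, m + 2, lam)
rhs = 2.0 * noncentral(chi2_pdf, x, m + 2, lam)
```

Because the identity holds term by term in the mixture, `lhs` and `rhs` agree up to rounding for any choice of the mixing weights.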

We now move to the class of GLMs, in which $C = H = J = P = U = 0$. By using the coefficients derived for this class of models in Section 4, the quantities that define equation (7) reduce to

ki = ~tx{(F + 2G){Z d - Zi d )T}, k 2 = -|tr{(F + 2G)T^}, k 3 = 3k u 
2 o 

h = kx- tr{(F - G)(Z d - Z ld )T}, k G = -^tr{FT^}, k 4 = 3fc 2 , 

k 7 = -2h, k 8 = -2k 2 , k 9 = h-h, k 10 = ^tr{(F-G)T^}, 

k u = -3tr{G(Z d - Z ld )T}, k 12 = -<f)tr{GT^}. 

For GLMs with canonical link ($G = 0$), we have $k_{11} = k_{12} = 0$ and hence $\Pi_2 = \Pi_3$. It is possible to show that $\Pi_1 = \Pi_2 = \Pi_4$ if $F = -2G$, that is

$$\frac{d^2\mu_l}{d\eta_l^2} = \frac{2}{3V_l}\frac{dV_l}{d\mu_l}\left(\frac{d\mu_l}{d\eta_l}\right)^2, \quad l = 1, \ldots, n.$$

The GLMs for which this equality holds have the link function defined by $\eta_l = \int V_l^{-2/3}\,d\mu_l$ ($l = 1, \ldots, n$). For the gamma model this function is $\eta_l = \mu_l^{-1/3}$ ($l = 1, \ldots, n$). Additionally, we have that $\Pi_3 = \Pi_4$ for any GLM with identity link function, i.e. $F = 0$. Also, $\Pi_1 = \Pi_3$ if $k_9 = k_{10} = 0$, i.e. $F = G$, which occurs only for normal models with any link. Finally, the equality $\Pi_1 = \Pi_2 = \Pi_3 = \Pi_4$ holds only for normal models with identity link function.

We can conclude that there is no uniform superiority of one test with respect to the others for testing the null hypothesis $H_0: \beta_2 = \beta_{20}$ in the class of DMs. Hence, if the sample size is large, all tests could be recommended, since their type I error probabilities do not significantly deviate from the true nominal level and their local powers are approximately equal. The natural question is how these tests perform when the sample size is small or of moderate size, and which one is the most reliable. In Section 7 we shall use Monte Carlo simulations to shed some light on this issue.



6 Tests for the precision parameter 



In this section we derive asymptotic expansions for the nonnull distribution of the four statistics for 
testing the precision parameter in DMs. We are interested in testing the null hypothesis 7i : = 4>o 
against a two-sided alternative hypothesis "Hi : ^ O , where O is a positive specified value for 0. 
Here, (3 acts as a nuisance parameter. The likelihood ratio, Wald, score and gradient statistics are 
expressed as follows: 

n 

Si = ^2{(<j> - </>o)t(yi,0t) + c(yt,<j)) - c(^,0 o )}, 3 2 = ($ - o ) 2 {-a 2 (0)}> 
i=i 



S 3 = {- a2 (0 o )}-i 



Si 



^)J2{t(yiA) + c {l) (yiAa)}. 



i=i 



J2{t(yiA) + c {l) (yi,d>o)} 
.i=i 

For PDMs, these statistics can be expressed as 

S l = 2n{a 2 (0) - a 2 (0 o ) - &- O )4 1) (0)}, S 2 = -n($ - <f) ) 2 a { 2 2) ($), 



S 4 = n{a$ 1) (0 o ) - 4 ij (0)H0 - 0o)- 



4 2) (0o) 



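For example, for the von Mises model $a_2(\phi) = -\log\{I_0(\phi)\}$. Also, $a_2^{(1)}(\phi) = -r(\phi)$ and $a_2^{(2)}(\phi) = r(\phi)^2 + r(\phi)/\phi - 1$, where $r(\phi) = I_1(\phi)/I_0(\phi)$. Thus, we can write

$$S_1 = 2n\bigl[\log\{I_0(\phi_0)/I_0(\hat\phi)\} + (\hat\phi - \phi_0)\,r(\hat\phi)\bigr], \qquad S_2 = -n(\hat\phi - \phi_0)^2\bigl\{r(\hat\phi)^2 + r(\hat\phi)/\hat\phi - 1\bigr\},$$

$$S_3 = -\frac{n\{r(\phi_0) - r(\hat\phi)\}^2}{r(\phi_0)^2 + r(\phi_0)/\phi_0 - 1}, \qquad S_4 = n\{r(\hat\phi) - r(\phi_0)\}(\hat\phi - \phi_0).$$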
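As a rough numerical companion to the von Mises expressions above, the following Python sketch evaluates the four statistics from $\hat\phi$, $\phi_0$ and $n$. It assumes $\hat\phi$ has already been computed elsewhere. The helper names (`bessel_i`, `r`, `von_mises_precision_stats`) and the input values are hypothetical, and the Bessel functions are obtained from their power series with the standard library only, which is not necessarily how the authors evaluated them.

```python
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x) via its power series (integer order >= 0)."""
    half = x / 2.0
    term = half ** order / math.factorial(order)   # k = 0 term of the series
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + order))    # ratio of consecutive series terms
        total += term
    return total

def r(phi):
    """r(phi) = I_1(phi) / I_0(phi)."""
    return bessel_i(1, phi) / bessel_i(0, phi)

def von_mises_precision_stats(phi_hat, phi0, n):
    """Return (S1, S2, S3, S4) for H0: phi = phi0 in the von Mises model."""
    r_hat, r0 = r(phi_hat), r(phi0)
    d = phi_hat - phi0
    a2_dd_hat = r_hat ** 2 + r_hat / phi_hat - 1.0   # second derivative of a_2 at phi_hat
    a2_dd_0 = r0 ** 2 + r0 / phi0 - 1.0              # second derivative of a_2 at phi0
    s1 = 2.0 * n * (math.log(bessel_i(0, phi0) / bessel_i(0, phi_hat)) + d * r_hat)
    s2 = -n * d ** 2 * a2_dd_hat
    s3 = -n * (r0 - r_hat) ** 2 / a2_dd_0
    s4 = n * (r_hat - r0) * d
    return s1, s2, s3, s4

# Illustrative inputs only; they are not taken from the paper.
stats = von_mises_precision_stats(phi_hat=2.6, phi0=2.0, n=35)
print(stats)
```

Because $a_2$ is concave for this model, all four values are nonnegative, and they vanish when $\hat\phi = \phi_0$.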
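Also, for normal and inverse Gaussian models we have $a_2(\phi) = \log(\phi)/2$. Hence

$$S_1 = n\Bigl\{\log\Bigl(\frac{\hat\phi}{\phi_0}\Bigr) - \frac{\hat\phi - \phi_0}{\hat\phi}\Bigr\}, \qquad S_2 = S_3 = \frac{n}{2}\Bigl(\frac{\hat\phi - \phi_0}{\hat\phi}\Bigr)^2, \qquad S_4 = \frac{n(\hat\phi - \phi_0)^2}{2\phi_0\hat\phi}.$$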
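The PDM expressions depend on the model only through $a_2$ and its first two derivatives, so they can be coded once and reused. The sketch below does this and checks the result against the closed forms that follow from substituting $a_2(\phi) = \log(\phi)/2$, namely $S_1 = n\{\log(\hat\phi/\phi_0) - (\hat\phi-\phi_0)/\hat\phi\}$, $S_2 = S_3 = (n/2)\{(\hat\phi-\phi_0)/\hat\phi\}^2$ and $S_4 = n(\hat\phi-\phi_0)^2/(2\phi_0\hat\phi)$. The function name `pdm_precision_stats` and the numerical inputs are assumptions made for illustration.

```python
import math

def pdm_precision_stats(a2, d1, d2, phi_hat, phi0, n):
    """Four statistics for H0: phi = phi0 in a proper dispersion model.

    a2, d1 and d2 are callables returning a_2 and its first and second derivatives.
    """
    diff = phi_hat - phi0
    s1 = 2.0 * n * (a2(phi_hat) - a2(phi0) - diff * d1(phi_hat))
    s2 = -n * diff ** 2 * d2(phi_hat)
    s3 = -n * (d1(phi0) - d1(phi_hat)) ** 2 / d2(phi0)
    s4 = n * (d1(phi0) - d1(phi_hat)) * diff
    return s1, s2, s3, s4

# Normal / inverse Gaussian case: a_2(phi) = log(phi) / 2.
a2 = lambda phi: 0.5 * math.log(phi)
d1 = lambda phi: 0.5 / phi
d2 = lambda phi: -0.5 / phi ** 2

phi_hat, phi0, n = 1.4, 1.0, 30            # illustrative values only
s1, s2, s3, s4 = pdm_precision_stats(a2, d1, d2, phi_hat, phi0, n)

# Closed forms obtained by substituting a_2 into the PDM expressions.
closed_s1 = n * (math.log(phi_hat / phi0) - (phi_hat - phi0) / phi_hat)
closed_s23 = 0.5 * n * ((phi_hat - phi0) / phi_hat) ** 2
closed_s4 = n * (phi_hat - phi0) ** 2 / (2.0 * phi0 * phi_hat)
print(s1, s2, s3, s4)
```

Passing the derivatives as callables keeps the same function usable for other PDMs, provided their $a_2$ derivatives are supplied.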

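We have $a_2(\phi) = \phi\log(\phi) - \log\{\Gamma(\phi)\}$ for the gamma model and therefore these statistics reduce to

$$S_1 = 2n\Bigl\{\phi_0\log\Bigl(\frac{\hat\phi}{\phi_0}\Bigr) - \log\Bigl(\frac{\Gamma(\hat\phi)}{\Gamma(\phi_0)}\Bigr) - (\hat\phi - \phi_0)\bigl(1 - \psi(\hat\phi)\bigr)\Bigr\},$$

$$S_2 = n\{\hat\phi\,\psi'(\hat\phi) - 1\}\frac{(\hat\phi - \phi_0)^2}{\hat\phi},$$

$$S_3 = \frac{n\phi_0\bigl\{\log(\hat\phi/\phi_0) - (\psi(\hat\phi) - \psi(\phi_0))\bigr\}^2}{\phi_0\,\psi'(\phi_0) - 1}$$

and

$$S_4 = n(\hat\phi - \phi_0)\Bigl\{\log\Bigl(\frac{\phi_0}{\hat\phi}\Bigr) + \psi(\hat\phi) - \psi(\phi_0)\Bigr\},$$

where $\Gamma(\cdot)$, $\psi(\cdot)$ and $\psi'(\cdot)$ are the gamma, digamma and trigamma functions, respectively.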

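The nonnull asymptotic distributions of $S_1$, $S_2$, $S_3$ and $S_4$ for testing $\mathcal{H}_0 : \phi = \phi_0$ in DMs under the local alternative $\mathcal{H}_{1n} : \phi = \phi_0 + \epsilon$, where $\epsilon = \phi - \phi_0$ is assumed to be $O(n^{-1/2})$, are

$$\Pr(S_i \le x) = G_{1,\lambda}(x) + \sum_{k=0}^{3} b_{ik}\,G_{1+2k,\lambda}(x) + O(n^{-1}), \qquad i = 1, 2, 3, 4.$$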
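To show how an expansion of this form can be evaluated numerically, here is a minimal Python sketch. It assumes the usual convention in which $G_{m,\lambda}$ is a Poisson mixture (with mean $\lambda/2$) of central chi-square distribution functions, and it handles only odd degrees of freedom, which is all the expansion needs. The function names and the coefficient values are hypothetical; in practice the $b_{ik}$ would come from the model at hand.

```python
import math

def chi2_pdf(x, m):
    """Central chi-square density with m degrees of freedom, for x > 0."""
    h = m / 2.0
    return math.exp((h - 1.0) * math.log(x) - x / 2.0
                    - h * math.log(2.0) - math.lgamma(h))

def chi2_cdf_odd(x, m):
    """Central chi-square cdf for odd m, using F_d(x) = F_{d-2}(x) - 2 f_d(x)."""
    cdf = math.erf(math.sqrt(x / 2.0))        # one degree of freedom
    for d in range(3, m + 1, 2):
        cdf -= 2.0 * chi2_pdf(x, d)
    return cdf

def noncentral_cdf(x, m, lam, terms=150):
    """G_{m,lam}(x) as a truncated Poisson(lam/2) mixture of central cdfs."""
    weight = math.exp(-lam / 2.0)
    total = 0.0
    for j in range(terms):
        total += weight * chi2_cdf_odd(x, m + 2 * j)
        weight *= (lam / 2.0) / (j + 1)
    return total

def expansion_cdf(x, lam, b1, b2, b3):
    """G_{1,lam}(x) + sum_k b_k G_{1+2k,lam}(x), with b_0 = -(b_1 + b_2 + b_3)."""
    b = [-(b1 + b2 + b3), b1, b2, b3]
    return noncentral_cdf(x, 1, lam) + sum(
        b[k] * noncentral_cdf(x, 1 + 2 * k, lam) for k in range(4))

x = 3.841458820694124                  # 5% critical value of the chi-square(1) law
lam = 1.5                              # illustrative noncentrality
approx_power = 1.0 - expansion_cdf(x, lam, b1=0.02, b2=-0.01, b3=0.005)
print(approx_power)
```

Since the four coefficients sum to zero, the approximation tends to one as $x$ grows, as a distribution function should.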

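The noncentrality parameter is given by $\lambda = -\alpha_2\epsilon^2$ and the coefficients $b_{ik}$ can be written as

$$b_{11} = \frac{(\alpha_2' - \alpha_3)\epsilon^3}{2} + \frac{p\epsilon}{2\phi}, \qquad b_{12} = \frac{(2\alpha_3 - 3\alpha_2')\epsilon^3}{6}, \qquad b_{13} = 0,$$

$$b_{21} = \frac{(\alpha_2' - \alpha_3)\epsilon^3}{2} - \frac{\alpha_3\epsilon}{2\alpha_2} + \frac{p\epsilon}{2\phi}, \qquad b_{22} = -\frac{(\alpha_2' - \alpha_3)\epsilon^3}{2} + \frac{\alpha_3\epsilon}{2\alpha_2}, \qquad b_{23} = -\frac{\alpha_3\epsilon^3}{6},$$

$$b_{31} = \frac{(\alpha_2' - \alpha_3)\epsilon^3}{2} + \frac{(2\alpha_3 - 3\alpha_2')\epsilon}{2\alpha_2} + \frac{p\epsilon}{2\phi}, \qquad b_{32} = -\frac{(2\alpha_3 - 3\alpha_2')\epsilon}{2\alpha_2}, \qquad b_{33} = \frac{(2\alpha_3 - 3\alpha_2')\epsilon^3}{6},$$

$$b_{41} = \frac{(\alpha_2' - \alpha_3)\epsilon^3}{2} + \frac{\alpha_3\epsilon}{4\alpha_2} + \frac{p\epsilon}{2\phi}, \qquad b_{42} = -\frac{(2\alpha_2' - \alpha_3)\epsilon^3}{4} - \frac{\alpha_3\epsilon}{4\alpha_2}, \qquad b_{43} = \frac{\alpha_3\epsilon^3}{12},$$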

with $b_{i0} = -(b_{i1} + b_{i2} + b_{i3})$, for $i = 1, 2, 3, 4$. It should be noticed that the above expressions depend on the parameter $\phi$ and depend on the local derivative matrix $X^{*}$ only through its rank $p$. Since $\alpha_2' = \alpha_3 = n a_2^{(3)}(\phi)$ for PDMs, these coefficients reduce to

$$b_{11} = \frac{p\epsilon}{2\phi}, \qquad b_{12} = b_{23} = b_{33} = -\frac{n a_2^{(3)}(\phi)\epsilon^3}{6}, \qquad b_{13} = 0, \qquad b_{21} = b_{31} = \frac{p\epsilon}{2\phi} - \frac{a_2^{(3)}(\phi)\epsilon}{2a_2^{(2)}(\phi)},$$

$$b_{22} = b_{32} = b_{11} - b_{21}, \qquad b_{41} = b_{11} + \frac{1}{2}(b_{11} - b_{21}), \qquad b_{42} = -\frac{1}{2}(b_{11} - b_{21} - 3b_{12}), \qquad b_{43} = -\frac{b_{12}}{2},$$

with $b_{i0} = -(b_{i1} + b_{i2} + b_{i3})$, for $i = 1, 2, 3, 4$. These coefficients do not change for the class of GLMs.

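In what follows, we present an analytical comparison among the local powers of the four tests for testing the null hypothesis $\mathcal{H}_0 : \phi = \phi_0$. We have

$$\Pi_i - \Pi_j = \sum_{k=0}^{3}(b_{jk} - b_{ik})\,G_{1+2k,\lambda}(x).$$

After some algebra, we can write

$$\Pi_1 - \Pi_2 = -\frac{\alpha_3\epsilon}{\alpha_2}\,g_{5,\lambda}(x) + \frac{\alpha_3\epsilon^3}{3}\,g_{7,\lambda}(x),$$

$$\Pi_1 - \Pi_3 = \frac{(2\alpha_3 - 3\alpha_2')\epsilon}{\alpha_2}\,g_{5,\lambda}(x) - \frac{(2\alpha_3 - 3\alpha_2')\epsilon^3}{3}\,g_{7,\lambda}(x),$$

$$\Pi_1 - \Pi_4 = \frac{\alpha_3\epsilon}{2\alpha_2}\,g_{5,\lambda}(x) - \frac{\alpha_3\epsilon^3}{6}\,g_{7,\lambda}(x),$$

$$\Pi_2 - \Pi_3 = \frac{3(\alpha_3 - \alpha_2')\epsilon}{\alpha_2}\,g_{5,\lambda}(x) - (\alpha_3 - \alpha_2')\epsilon^3\,g_{7,\lambda}(x),$$

$$\Pi_2 - \Pi_4 = \frac{3\alpha_3\epsilon}{2\alpha_2}\,g_{5,\lambda}(x) - \frac{\alpha_3\epsilon^3}{2}\,g_{7,\lambda}(x),$$

$$\Pi_3 - \Pi_4 = -\frac{3(\alpha_3 - 2\alpha_2')\epsilon}{2\alpha_2}\,g_{5,\lambda}(x) + \frac{(\alpha_3 - 2\alpha_2')\epsilon^3}{2}\,g_{7,\lambda}(x).$$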
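The algebra behind these differences can be sketched for $\Pi_1 - \Pi_2$. The sketch relies on the standard identity $G_{m,\lambda}(x) - G_{m+2,\lambda}(x) = 2g_{m+2,\lambda}(x)$ and writes $c_k = b_{2k} - b_{1k}$, a shorthand introduced here only for illustration. Reading the general coefficients as $b_{21} - b_{11} = -\alpha_3\epsilon/(2\alpha_2)$ and $b_{23} - b_{13} = -\alpha_3\epsilon^3/6$, and noting that $b_{i1} + b_{i2} + b_{i3}$ then takes the same value for every $i$ (so that $c_0 = 0$ and $c_2 = -c_1 - c_3$), we get

$$
\begin{aligned}
\Pi_1 - \Pi_2 &= c_1 G_{3,\lambda}(x) + c_2 G_{5,\lambda}(x) + c_3 G_{7,\lambda}(x) \\
&= c_1\{G_{3,\lambda}(x) - G_{5,\lambda}(x)\} - c_3\{G_{5,\lambda}(x) - G_{7,\lambda}(x)\} \\
&= 2c_1\,g_{5,\lambda}(x) - 2c_3\,g_{7,\lambda}(x) \\
&= -\frac{\alpha_3\epsilon}{\alpha_2}\,g_{5,\lambda}(x) + \frac{\alpha_3\epsilon^3}{3}\,g_{7,\lambda}(x).
\end{aligned}
$$

The remaining five differences follow in the same way, with $c_1$ and $c_3$ replaced by the corresponding differences $b_{j1} - b_{i1}$ and $b_{j3} - b_{i3}$.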

From the above expressions, we can obtain the following general conclusions. By assuming $\phi > \phi_0$ (opposite inequalities hold if $\phi < \phi_0$), we have that $\Pi_3 < \Pi_2 < \Pi_1 < \Pi_4$ if $\alpha_3 > 0$ with $\alpha_2' > \alpha_3$. Also, $\Pi_2 = \Pi_3 < \Pi_1 < \Pi_4$ if $\alpha_2' = \alpha_3 > 0$. For example, for normal and inverse Gaussian models we have $a_2(\phi) = \log(\phi)/2$, which implies that $a_2^{(1)}(\phi) = 1/(2\phi)$, $a_2^{(2)}(\phi) = -1/(2\phi^2)$ and $a_2^{(3)}(\phi) = 1/\phi^3$. Since $\alpha_2' = \alpha_3 = n/\phi^3 > 0$, we arrive at the following inequalities: $\Pi_2 = \Pi_3 < \Pi_1 < \Pi_4$ if $\phi > \phi_0$, and $\Pi_2 = \Pi_3 > \Pi_1 > \Pi_4$ if $\phi < \phi_0$.



7 Monte Carlo simulation 



In this section we conduct Monte Carlo simulations in order to compare the performance of the 
likelihood ratio, Wald, score and gradient tests in small- and moderate-sized samples. 

We consider the von Mises regression model, which is quite useful for modeling circular data; see Fisher (1993) and Mardia and Jupp (2000). Here,

$$\pi(y; \theta, \phi) = \frac{\exp\{\phi\cos(y - \theta)\}}{2\pi I_0(\phi)}, \qquad y \in (-\pi, \pi),$$

where $\theta \in (-\pi, \pi)$ and $\phi > 0$. This density function is symmetric around $y = \theta$, which is the mode and the circular mean of the distribution. Also, $\phi$ is a precision parameter in the sense that the larger the value of $\phi$, the more concentrated the density function is around $\theta$. It is evident that the density function above is a special case of (1) with $t(y, \theta) = \cos(y - \theta)$ and $c(y, \phi) = -\log\{I_0(\phi)\}$.
We assume that

$$\tan(\theta_i/2) = \eta_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip},$$

where $x_{i1} = 1$ and $\theta_i = 2\arctan(\eta_i)$, $i = 1, \ldots, n$. The covariate values were selected as random draws from the $U(0, 1)$ distribution and for fixed $n$ those values were kept constant throughout the experiment. The number of Monte Carlo replications was 10,000, the nominal levels of the tests were $\gamma$ = 10%, 5% and 1%, and all simulations were carried out using the Ox matrix programming language (Doornik, 2007). Ox is freely distributed for academic purposes and available at http://www.doornik.com.
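The data-generating step of such an experiment can be sketched in Python with the standard library alone. This is not the authors' Ox code: the function names, the seed and the parameter values below are assumptions chosen for illustration. Covariates are drawn once and then held fixed, as described above, while responses are redrawn for each replication.

```python
import math
import random

def draw_covariates(n, p, rng):
    """n x p design matrix: intercept column plus U(0, 1) covariates."""
    return [[1.0] + [rng.random() for _ in range(p - 1)] for _ in range(n)]

def mean_directions(X, beta):
    """theta_i = 2 * arctan(eta_i), which always lies in (-pi, pi)."""
    return [2.0 * math.atan(sum(b * x for b, x in zip(beta, row))) for row in X]

def draw_responses(theta, phi, rng):
    """One von Mises response per mean direction, wrapped to [-pi, pi]."""
    y = []
    for t in theta:
        # random.vonmisesvariate expects the mean angle in [0, 2*pi).
        draw = rng.vonmisesvariate(t % (2.0 * math.pi), phi)
        y.append(math.atan2(math.sin(draw), math.cos(draw)))
    return y

rng = random.Random(2024)                 # arbitrary seed, for reproducibility
beta = [1.0, 1.0, 0.0, 0.0]               # p = 4, last two coefficients zero under the null
X = draw_covariates(n=50, p=len(beta), rng=rng)   # kept fixed across replications
theta = mean_directions(X, beta)
y = draw_responses(theta, phi=2.5, rng=rng)       # one simulated sample
```

A full replication loop would call `draw_responses` repeatedly with the same `X`, fit the model under the null and alternative hypotheses, and record how often each statistic exceeds its chi-square critical value.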

First, the null hypothesis is $\mathcal{H}_0 : \beta_{p-1} = \beta_p = 0$, which is tested against a two-sided alternative. The sample size is $n = 50$, $\phi$ = 1.5, 2.5, 4 and $p = 3, 4, \ldots, 8$. The values of the response were generated using $\beta_1 = \cdots = \beta_{p-2} = 1$. The null rejection rates of the four tests are presented in Table 2. It is clear that the likelihood ratio ($S_1$) and Wald ($S_2$) tests are markedly liberal, more so as the number of regressors increases. The score ($S_3$) and gradient ($S_4$) tests are also liberal in most of the cases, but much less size distorted than the likelihood ratio and Wald tests in all cases. For instance, when $\phi = 2.5$, $p = 4$ and $\gamma$ = 5%, the rejection rates are 7.05% ($S_1$), 8.28% ($S_2$), 5.15% ($S_3$) and 6.30% ($S_4$). We note that the score test is much less liberal than the likelihood ratio and Wald tests and slightly less liberal than the gradient test. Additionally, the Wald test is much more liberal than the other tests. Note that as $\phi$ increases the tests become less size distorted, as expected, since the von Mises distribution approaches a normal distribution as $\phi$ increases.

Table 2: Null rejection rates (%); $\phi$ = 1.5, 2.5 and 4, with $n = 50$.

$\phi = 1.5$
       $\gamma = 10\%$                 $\gamma = 5\%$                  $\gamma = 1\%$
  p    $S_1$   $S_2$   $S_3$   $S_4$   $S_1$   $S_2$   $S_3$   $S_4$   $S_1$   $S_2$   $S_3$   $S_4$
  3    13.31   15.42   10.12   10.42   6.90    9.93    4.65    5.04    1.75    4.13    0.79    1.20
  4    14.48   16.31   10.26   12.49   7.75    10.86   4.83    6.83    1.93    4.62    0.59    2.08
  5    16.65   19.34   10.92   12.46   9.55    12.36   5.05    6.62    2.67    4.87    0.84    1.83
  6    19.04   21.93   11.94   14.81   11.78   15.00   5.90    8.26    3.62    6.50    1.03    2.40
  7    22.09   26.39   12.44   15.94   13.71   18.12   6.12    8.87    4.27    7.67    1.27    2.21
  8    24.16   26.58   13.03   17.66   15.87   17.42   6.63    9.82    5.23    6.82    1.39    2.76

$\phi = 2.5$
       $\gamma = 10\%$                 $\gamma = 5\%$                  $\gamma = 1\%$
  p    $S_1$   $S_2$   $S_3$   $S_4$   $S_1$   $S_2$   $S_3$   $S_4$   $S_1$   $S_2$   $S_3$   $S_4$
  3    12.02   12.96   10.56   10.50   6.21    7.35    5.17    5.29    1.39    2.31    0.78    1.04
  4    12.97   13.66   11.05   11.77   7.05    8.28    5.15    6.30    1.73    3.05    0.90    1.52
  5    14.28   16.38   10.97   11.68   7.96    10.31   4.94    6.25    2.11    4.28    0.85    1.65
  6    14.83   15.33   11.90   13.02   8.36    9.82    5.71    7.27    2.09    3.85    1.01    1.80
  7    15.93   18.00   12.60   13.87   9.20    11.30   6.66    7.60    2.72    3.71    1.53    1.87
  8    18.12   19.53   13.45   16.12   11.16   12.29   7.02    9.38    3.31    4.79    1.55    2.68

$\phi = 4$
       $\gamma = 10\%$                 $\gamma = 5\%$                  $\gamma = 1\%$
  p    $S_1$   $S_2$   $S_3$   $S_4$   $S_1$   $S_2$   $S_3$   $S_4$   $S_1$   $S_2$   $S_3$   $S_4$
  3    11.99   12.59   10.72   10.81   6.32    7.19    5.02    5.25    1.37    2.20    0.82    1.12
  4    13.15   14.48   11.49   11.74   7.19    8.66    5.50    5.83    1.67    2.89    0.84    1.13
  5    13.59   13.67   11.87   12.26   7.21    7.64    5.72    6.25    1.68    2.50    0.96    1.35
  6    14.08   15.60   11.85   12.65   7.57    9.04    5.88    6.30    1.73    2.88    1.00    1.21
  7    15.16   16.42   12.79   13.52   8.34    9.55    6.42    7.03    2.28    3.16    1.43    1.71
  8    16.14   17.36   13.53   14.57   9.28    10.31   7.13    7.84    2.42    2.96    1.28    1.61


Table 3 reports results for $\phi = 3$, $p = 4$ and sample sizes ranging from 20 to 150. As expected, the null rejection rates of all the tests approach the corresponding nominal levels as the sample size grows. Again, the score and gradient tests present the best performances. In Table 4 we present the first two moments of $S_1$, $S_2$, $S_3$ and $S_4$ and the corresponding moments of the limiting $\chi^2$ distribution. Note that the gradient and score statistics present a good agreement between the true moments (obtained by simulation) and the moments of the limiting distribution.



Table 3: Null rejection rates (%); $\phi = 3$, $p = 4$ and different sample sizes.

       $\gamma = 10\%$                 $\gamma = 5\%$                  $\gamma = 1\%$
  n    $S_1$   $S_2$   $S_3$   $S_4$   $S_1$   $S_2$   $S_3$   $S_4$   $S_1$   $S_2$   $S_3$   $S_4$
  20   17.33   19.18   13.71   13.89   10.50   11.95   6.92    7.04    3.33    4.38    1.16    1.14
  30   15.04   16.33   11.65   12.76   8.29    10.19   5.10    6.66    2.05    4.14    0.75    1.50
  40   13.49   15.23   11.44   11.44   7.56    9.43    5.72    5.96    1.81    3.07    0.92    1.18
  50   12.51   13.78   10.77   11.05   6.65    7.79    5.40    5.59    1.66    2.31    1.02    1.25
  70   12.01   12.46   11.00   11.17   6.20    6.90    5.41    5.58    1.48    2.18    1.12    1.28
  100  11.30   12.13   10.74   10.69   5.86    6.65    4.92    5.44    1.22    2.04    0.94    1.07
  150  10.51   11.01   10.02   10.10   5.05    6.03    4.59    4.63    1.08    1.66    0.94    0.95


Table 4: Moments; $\phi = 2$, $n = 35$, $p = 4$.

            $S_1$   $S_2$   $S_3$   $S_4$   $\chi^2_2$
  Mean      2.50    2.68    2.16    2.23    2.0
  Variance  6.23    8.73    4.14    4.63    4.0



We also performed Monte Carlo simulations considering hypothesis testing on $\phi$. To save space, the results are not shown. The score and gradient tests behaved better than the likelihood ratio and Wald tests. For example, when $n = 35$, $p = 3$, $\gamma$ = 10% and $\mathcal{H}_0 : \phi = 2$, we obtained the following null rejection rates: 13.23% ($S_1$), 14.75% ($S_2$), 10.61% ($S_3$) and 9.97% ($S_4$). Again, the best performing tests are the score and gradient tests.
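The mechanics of estimating null rejection rates for a test on $\phi$ can be illustrated with a deliberately simplified stand-in: an intercept-only normal model, for which $\hat\phi = n/\sum_i (y_i - \bar{y})^2$ and the four statistics have the closed forms that follow from $a_2(\phi) = \log(\phi)/2$. This sketch does not reproduce the von Mises regression experiment described above. The function names, seed and number of replications are assumptions, and no particular rejection rates are claimed.

```python
import math
import random

CRIT_10 = 2.705543    # 0.90 quantile of the chi-square(1) distribution

def precision_stats_normal(y, phi0):
    """(S1, S2, S3, S4) for H0: phi = phi0 in an intercept-only normal model."""
    n = len(y)
    mean = sum(y) / n
    phi_hat = n / sum((v - mean) ** 2 for v in y)   # ML estimate of the precision
    d = phi_hat - phi0
    s1 = n * (math.log(phi_hat / phi0) - d / phi_hat)
    s2 = 0.5 * n * (d / phi_hat) ** 2
    s3 = s2                                          # Wald and score coincide here
    s4 = n * d ** 2 / (2.0 * phi0 * phi_hat)
    return s1, s2, s3, s4

def null_rejection_rates(n, phi0, reps, rng):
    """Share of replications in which each statistic exceeds the 10% critical value."""
    counts = [0, 0, 0, 0]
    sd = 1.0 / math.sqrt(phi0)
    for _ in range(reps):
        y = [rng.gauss(0.0, sd) for _ in range(n)]   # data generated under H0
        for k, s in enumerate(precision_stats_normal(y, phi0)):
            counts[k] += s > CRIT_10
    return [c / reps for c in counts]

rates = null_rejection_rates(n=35, phi0=2.0, reps=2000, rng=random.Random(1))
print(rates)
```

Replacing the data generator and the statistics with their von Mises regression counterparts would turn the same loop into the kind of experiment summarised in this section.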

Overall, in small to moderate-sized samples the best performing tests are the score and the gradient tests. They are less size distorted than the other two. Hence, these tests may be recommended for testing hypotheses on the regression parameters in the von Mises regression model. The gradient test has a slight advantage over the score test because the gradient statistic is simpler to calculate than the score statistic for testing a subset of regression parameters. In particular, no matrix needs to be inverted; see Section 3.



8 Application 



In this section we shall illustrate an application of the likelihood ratio, Wald, score and gradient tests to a real data set. We consider the data described in Fisher and Lee (1992) regarding the distance traveled by 31 small blue periwinkles (Nodilittorina unifasciata) after they have moved down-shore from the height at which they normally live. Following Fisher and Lee (1992) we assume a von Mises distribution for the animals' path, but with the assumption of constant dispersion and link function

$$\tan(\theta_i/2) = \beta_1 + \beta_2 x_i, \qquad i = 1, \ldots, 31,$$

where $\theta_i = 2\arctan(\beta_1 + \beta_2 x_i)$ denotes the mean direction for a given distance moved $x_i$ (cm). These data have been previously analysed by Paula (1996) and Souza and Paula (2002) with emphasis on local influence and residual analysis, respectively. The angular responses were transformed to the range $(-\pi, \pi)$. The maximum likelihood estimates of the parameters (asymptotic standard errors in parentheses) are: $\hat\beta_1 = -0.323\,(0.151)$, $\hat\beta_2 = -0.013\,(0.004)$ and $\hat\phi = 3.265\,(0.726)$. The values of the likelihood ratio ($S_1$), Wald ($S_2$), score ($S_3$) and gradient ($S_4$) statistics for testing the null hypothesis $\mathcal{H}_0 : \beta_2 = 0$ are 9.526 (p-value: 0.002), 11.031 (p-value: 0.001), 7.126 (p-value: 0.008) and 8.280 (p-value: 0.004), respectively. At any usual significance level, all tests lead to the same conclusion, i.e. the null hypothesis should be rejected.
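Since a single parameter is being tested, each p-value is the upper tail of a $\chi^2_1$ distribution at the observed statistic, which equals $\mathrm{erfc}(\sqrt{s/2})$. The short Python sketch below recomputes the p-values from the four statistics quoted above; the function name `chi2_1_pvalue` is hypothetical.

```python
import math

def chi2_1_pvalue(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Observed statistics for H0: beta_2 = 0, as quoted in the text.
observed = {"S1": 9.526, "S2": 11.031, "S3": 7.126, "S4": 8.280}
pvalues = {name: chi2_1_pvalue(value) for name, value in observed.items()}

for name, p in pvalues.items():
    print(f"{name}: p-value = {p:.4f}")
```

The same function applies to any of the one-parameter tests considered in this section.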

Now, we consider different values for $\beta_{20}$ and we wish to test $\mathcal{H}_0 : \beta_2 = \beta_{20}$ against $\mathcal{H}_1 : \beta_2 \neq \beta_{20}$. Table 5 lists the observed values of the different test statistics and the corresponding p-values for $\beta_{20}$ = -0.026, -0.024, -0.022, -0.020 and -0.018. The asterisks indicate that the null hypothesis is rejected at the 1% (***), the 5% (**) or the 10% (*) significance level, respectively. Notice that the same decision is reached by all the tests when $\beta_{20} = -0.018$ but not when $\beta_{20}$ = -0.026, -0.024, -0.022 and -0.020. In all cases considered here, the score and gradient tests lead to the same conclusion. Additionally, the likelihood ratio and Wald tests display the smallest p-values in all cases, in accordance with their liberal behaviours observed in our simulation study.



Table 5: Test statistics for $\mathcal{H}_0 : \beta_2 = \beta_{20}$ against $\mathcal{H}_1 : \beta_2 \neq \beta_{20}$ (p-values between parentheses).

             $\beta_{20}$
  statistic  -0.026               -0.024               -0.022               -0.020               -0.018
  $S_1$      7.314 (0.007)***     5.606 (0.018)**      4.011 (0.045)**      2.591 (0.107)        1.411 (0.235)
  $S_2$      11.409 (0.001)***    8.193 (0.004)***     5.509 (0.019)**      3.355 (0.067)*       1.733 (0.188)
  $S_3$      5.872 (0.015)**      4.636 (0.031)**      3.407 (0.065)*       2.251 (0.134)        1.249 (0.264)
  $S_4$      5.728 (0.017)**      4.611 (0.032)**      3.458 (0.063)*       2.332 (0.127)        1.321 (0.250)



Notice that the sample size is $n = 31$, but if $n$ were smaller, the tests could lead to different conclusions. To illustrate this, a randomly chosen subset of the data set with $n = 10$ was drawn. The null hypothesis to be tested is $\mathcal{H}_0 : \beta_2 = 0$. The observed values of the test statistics are $S_1 = 2.939$ (p-value: 0.086), $S_2 = 2.980$ (p-value: 0.084), $S_3 = 2.491$ (p-value: 0.114) and $S_4 = 2.682$ (p-value: 0.101). Hence, at the 10% significance level, the score and gradient tests do not reject the null hypothesis, unlike the likelihood ratio and Wald tests, which are much more oversized than the score and gradient tests, as evidenced by our simulation results.



9 Concluding remarks



The dispersion models (DMs) extend the well-known generalised linear models (Nelder and Wedderburn, 1972) and also the exponential family nonlinear models (Cordeiro and Paula, 1989). Additionally, the class of DMs covers a comprehensive range of non-normal distributions. In this paper, we dealt with the issue of performing hypothesis testing in DMs. We considered the three classic tests (likelihood ratio, Wald and score) and a recently proposed test, the gradient test. We have derived formulae for the asymptotic expansions up to order $n^{-1/2}$ of the distribution functions of the likelihood ratio, Wald, score and gradient statistics, under a sequence of Pitman alternatives, for testing a subset of regression parameters and for testing the precision parameter. The formulae derived are simple enough to be used analytically to obtain closed-form expressions for these expansions in special models. Also, the powers of all four criteria, which are equivalent to first order, were compared under specific conditions based on second order approximations. Additionally, we presented Monte Carlo simulations in order to compare the finite-sample performance of these tests. From the simulation results we can conclude that the score and gradient tests should be preferred. Finally, we presented an empirical application for illustrative purposes.